The i860™ Processor



The i860 processor is a highly integrated 64-bit RISC processor. Integrated on-chip are the
integer execution unit, pipelined floating-point adder and multiplier units, 3-D graphics unit,
four-KByte instruction cache, eight-KByte data cache and paging unit. Up to three operations
can simultaneously take place in the integer, floating-point adder and multiplier units. The
internal instruction bus is 64 bits wide while the internal cache bus is 128 bits wide. The parallel
architecture provides unprecedented and balanced performance for integer, floating-point and
graphics operations. The block diagram of the i860 processor is shown in Figure 1. The current
maximum operating frequency of the i860 processor is 40 MHz.



2 Introduction 



Benchmark programs evaluate the performance of not only a given architecture but also a host of
other tightly coupled hardware/software constituents. Operating system, compilers, libraries,
system workload, memory design and I/O subsystem each play a key role in the measured results.
This is the case for the i860 processor, since the compilers are still in
preliminary form and the memory subsystem is less than optimized. In
addition, there are a few errata impacting the performance identified on the first stepping of the
i860 processor silicon on which all the measured benchmark results were obtained. Despite all
these performance detriments, the i860 processor still outperforms [...]

For highly vectorizable code, the vector processing capability of the i860 processor
sets it apart from the other RISC processors, as illustrated clearly by the
vectorized Linpack results. In fact, the vector performance of the i860 processor approaches that
of the supercomputers and mainframes with vector hardware.

Integer performance comparisons are made relative to the VAX 11/780 under UNIX (BSD 4.3).
There is no single reference machine for floating-point comparisons.

3 Performance Summary 

The measured, simulated and projected performance numbers are presented in this section. The
i860 processor performance is then compared with that of other processors.

3.1 i860 Processor Benchmark Results 

The benchmark results for the i860 processor are summarized in Table 1. All the
benchmarks were run on a 33.3 MHz system and the numbers for 40 MHz were calculated by
scaling the 33.3 MHz results.

Release 1.0 _5_ 



Table 1. i860™ Processor Benchmark Results at 40 MHz

Benchmarks                   Measured    Simulated   Projected
                             Results     Results     Results

Dhrystone (KDhry/sec)
  Version 1.1                82.4        8?.2        90.?
  Version 2.1                78.?        N.A.        85.0

Stanford Integer (MIPS)      N.A.        32.4        33.2

Whetstone (MWhet/sec)
  Single Precision           30.8        30.6        32.4
  Double Precision           24.0        23.3        25.0

Linpack (MFLOPS)
  Fortran Double             N.A.        7.3         10.0
  Coded Double               N.A.        13.?        13.2



3.2 Performance Comparisons


4 Benchmarking Methodology 

U SdK^ "?T f F reSCnted <n » results *» versions 

are still quoted I KtiSuiSSffTSJ aSt^ T 5 ™ J* 1 *■?■ tends to be better and hence 
vectorizeTFo^BLl^t^prov^ * L " P-Ck "^ ** bodl "* (asscmblv > coded «* 

?£S aretan? £j TfSSH &^*^~*-«* *— ■ 

reflect what the i860 nrnr^r^U ^« Pjovme the complete performance comparisons and to 

Xsjsss^sfssgar 1 conditio,,s wm »• «■* *-*- » ««« 
Figure 3: Summary of Floating-Point Benchmark Results
5.2 Compilation and Run-Time Conditions

s^Jr^Hol^ 0S/2 ver U0 "**» Ae communications 

The Host Link software also provides a similar operating environment on the simulation vehicle.
Compilers. Compilation optimizations are as follows:

Benchmark Compiler Version Options

STO" Z m mh Fortran 18 -5 Note 1 

Stanford Green Hills C 1.8.5 Note 1 

Whetstone Green Hills Fortran 1.8.5 Note 2

Linpack Green Hills Fortran 1.8.5 Note 1

Note 1: -OLM -X405 -X370 -X393 -X422 
Note 2: -OLM -X405 -X370 -X393 -X422 -X425 

The compiled code is less optimized so as to work around errata on early
steppings. The measured performance would be negatively impacted
by the workarounds. However, simulation was performed without
workaround code for the errata.
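
The option notes above can be captured as data so that the flag set for each benchmark is looked up rather than retyped. The sketch below is illustrative only: the dictionary names and the helpers `options_for` and `extra_options` are hypothetical, the flag strings are copied verbatim from Note 1 and Note 2, and no meaning is assumed for any individual -X switch. Only Stanford, Whetstone and Linpack are included.

```python
# Compiler option sets, copied verbatim from Note 1 and Note 2.
NOTE_OPTIONS = {
    1: ["-OLM", "-X405", "-X370", "-X393", "-X422"],
    2: ["-OLM", "-X405", "-X370", "-X393", "-X422", "-X425"],
}

# Which note applies to which benchmark (hypothetical lookup table
# built from the Options column of the compiler table).
BENCHMARK_NOTE = {
    "Stanford": 1,
    "Whetstone": 2,
    "Linpack": 1,
}


def options_for(benchmark: str) -> list[str]:
    """Return a copy of the option list recorded for a benchmark."""
    return list(NOTE_OPTIONS[BENCHMARK_NOTE[benchmark]])


def extra_options(note_a: int, note_b: int) -> list[str]:
    """Options present in note_b but absent from note_a, in original order."""
    return [opt for opt in NOTE_OPTIONS[note_b] if opt not in NOTE_OPTIONS[note_a]]


if __name__ == "__main__":
    print(" ".join(options_for("Whetstone")))
    print(extra_options(1, 2))
```

Comparing the two notes this way shows that they differ by a single switch, which is the only difference in how Whetstone was compiled relative to the other two benchmarks.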

^ZS&Z""™" Pr0Vifcd " "« """^vendors for 

ubraries is DiDdnLdT?T»^2f~ • e 2 mvatalt R"™" program containing cans lo vector 
P riof ro S „o^rc^il2Sn! ra0nZer ' S "^ " VeCt0nze *" "a**™** (ror example. UnS 

^ ri o?jF e R brf989^S USe<l I*- »-»«*- — • w*— in assembly code. Beta site 
5.3 Timing Measurements and Host Link Overheads

The Host Link overheads are impossible to single out and are hence embedded in the measured results.

6 Benchmark Results 
Table 2. Dhrystone Benchmark Results

Dhrys/Sec   Rel    System

 1,571      0.9    VAX 11/780, 4.3BSD [MIPS 88]
 1,757      1.0    VAX 11/780, VAX/VMS 4.2 [Intergraph 86]
 3,850      2.2    Sun-3/100 [Muchnick 88]
 6,374      3.6    Sun-3/260, 25MHz 68020, SunOS 3.2
 6,423      3.7    VAX 8600, 4.3BSD
 6,440      3.7    IBM 4381-2, UTS V, cc 1.11
 6,?96      3.9    Intergraph InterPro 32C, SYSV R3 3.0.0, Green Hills -O
 7,109      4.0    Apollo DN4000 -O
 7,140      4.1    Sun-3/200 [Muchnick 88]
 7,249      4.2    Convex C-1 XP 6.0, vc 1.1
 7,409      4.2    VAX 8600, VAX/VMS in [Intergraph 86]
 7,655      4.4    Alliant FX/8 [Alliant 86]
 8,?09      4.7    InterPro-32C, 30MHz Clipper, Green Hills [Intergraph 86]
 9,436      5.4    Convergent Server PC, 20MHz 80386, Green Hills
 9,920      5.6    HP 9000/840S [HP 87]
10,416      5.9    VAX 8550, VAX/VMS 4.3, cc 2.2
10,787      6.1    VAX 8650, VAX/VMS [Intergraph 86]
11,?15      6.4    HP 9000/840, HP-UX, full optimization
12,639      7.2    HP 9000/825S [HP 87]
13,000      7.4    MIPS M/500, 8MHz R2000, -O3
13,157      7.5    HP 825SRX [Sun 87]
14,109      8.0    Sun-4/110 [Sun 88]
14,195      8.1    Multiflow Trace 7/200 [Multiflow]
14,820      8.4    CRAY 1S
15,007      8.5    IBM 3081, UTS SUR23, cc 1.3
15,376      8.9    HP 9000/850 [HP 87]
18,330     10.3    CRAY X-MP
19,000     10.8    Sun-4/200, 16.7MHz SPARC [Muchnick 88], -O3
19,800     11.3    MIPS M/800, 12.5MHz R2000, -O3
23,430     13.3    HP 835S [RISC Mgmt 88]
23,700     13.5    MIPS M/1000, 15MHz R2000, -O3
27,400     15.6    MIPS M/120-5, 16.7MHz R2000, -O3
28,346     16.4    Amdahl 5860, UTS-V, cc 1.32
31,350     17.8    IBM 3090/200
34,000     19.4    Motorola 88000, 20MHz, unknown configuration [RISC Mgmt 88]
35,653     20.3    AMD 29000, 25MHz, 2 8K caches (simulation) [AMD 88]
42,300     24.1    MIPS M/2000, 25MHz R3000, -O3
43,668     24.9    Amdahl 5890/300E, cc -O
53,108     30.2    CCI Power 7/64 (simulation) [Simpson 88]
69,000     39.3    i860 uP @ 33.3MHz (Measured)
82,300     47.2    i860 uP @ 40MHz (Scaled from 33.3MHz)
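
The Rel column is consistent with dividing each Dhrystones/sec figure by the VAX 11/780 VAX/VMS result of 1,757, and the 40 MHz row is described as scaled from the 33.3 MHz measurement. A minimal sketch of both calculations follows. The function names are hypothetical, and linear scaling with clock frequency is an assumption, since the exact scaling method is not spelled out here.

```python
VAX_11_780_VMS_DHRYSTONES = 1757.0  # Dhrystones/sec, the 1.0 row of Table 2


def relative_rating(dhrystones_per_sec: float) -> float:
    """Performance relative to the VAX 11/780 VAX/VMS row, to one decimal place."""
    return round(dhrystones_per_sec / VAX_11_780_VMS_DHRYSTONES, 1)


def scale_by_clock(result: float, measured_mhz: float, target_mhz: float) -> float:
    """Scale a result linearly with clock frequency (assumed, not confirmed)."""
    return result * target_mhz / measured_mhz


if __name__ == "__main__":
    print(relative_rating(1571))              # VAX 11/780, 4.3BSD row
    print(relative_rating(53108))             # CCI Power 7/64 (simulation) row
    print(scale_by_clock(69000, 33.3, 40.0))  # i860 measurement scaled to 40 MHz
```

Linear scaling implicitly assumes that the memory subsystem keeps pace with the faster clock, so a figure obtained this way is best read as an estimate rather than a measurement.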



The single and double precision results obtained on the benchmarking system are 30.8 and 24.0
MWhet/sec respectively. Some improvements in the performance are expected with errata fixes
and future releases of the compilers and libraries. The projected single and double precision
Table 4. Whetstone Benchmark Results

DP SP
KWips KWips



410 

715 

830 

924 

U30 

1481 

1,730 

1,740 

1463 

2,092 

2,433 
2490 
2,673 
2,670 
2,907 
2,940 
3,885 
3,950 
4,000 
4,120 

4,200 
4,220 



500 
1,083 
USD 
1,039 
U50 
1^86 
1360 
2^80 
2,433 
3,115 

3421 
4,170 
3469 
4490 
4,202 
4,215 
5,663 
6,670 
6,900 
4,930 



5,430 



4,400 15,000 

6,600 

6,930 8470 

7,960 10,280 

9,100 11,400 

12,605 

13,600 17300 

14,069 

20^00 25441 
24400 30400 

25,000 
35,000 



System 

VAX 11/780, 4.3BSD, f77 [MIPS 88]
VAX 11/780, LLL compiler [MIPS 88]
VAX 11/780, VAX/VMS [Intergraph 86]
Sun-3/160C, 68881 [Wilson 88]
Sun-3/260, 25MHz 68020, 20MHz 68881

^l^SS^S^ 8020 ' 25MHz 68881[Wflson 88] 
Intel 80386+80387, 20MHz, 64K cache, Green Hills
Intergraph InterPro 32C, 30MHz Clipper [Intergraph 86]
Sun-3/260, 25MHz 68020, 20MHz [...]
HP 9000/840S [HP 87]

HP 9000/825S [HP 87]

Intel 80386+Weitek 1167, 20MHz, Green Hills

Sun-3/260, Weitek FPA [Wilson 88]

VAX 8600, VAX/VMS [Intergraph 86]

HP 9000/850S [HP 87]

Sun-4/110 [Sun 88]

Sun-4/200, 16.7MHz SPARC, Weitek 1164/5 [Wilson 88]

VAX 8700, VAX/VMS [...]

VAX 8650, VAX/VMS [Intergraph 86]

Alliant FX/8 (1 CE) [Alliant 86]

Convex C-1 XP [Multiflow]
MIPS M/500

sasf ssscrjf ■*- * *»- ■" 

MIPS M/800 

MIPS M/1000 

MIPS M/120-5 

Multiflow Trace 7/200 [Multiflow] 

MIPS M/200JW, 25MHz R3000/R3010 

CCI Power 7/64 (simulation) [Simpson 88]

i860 uP @ 33.3MHz (Measured)

i860 uP @ 40MHz (Scaled from 33.3MHz)

IBM 3090-200 [Multiflow] 
Cray X-MP/12 



S r fo^T^^^STS 1 ^ + »l d SS. Where "■ — ■»'*«— <— 
Table 5. 100 x 100 Linpack Benchmark Results

DP        DP
Fortran   Coded    System

.10       ?        Sun-3/160, 16.7MHz (Rolled BLAS)
?         ?        Sun-3/260, 25MHz 68020 + 20MHz 68881 (Rolled BLAS)
.13       .16      DEC MicroVAX II, VAX/VMS
.1?       ?        Apollo DN4000, 25MHz (68020+68881) [ENEWS 87]
.1?       ?        VAX 11/780, 4.3BSD, LLL Fortran [ours]
.14       .17      VAX 11/780, VAX/VMS
.20                80386+80387, 20MHz, 64K cache, Green Hills
.29       .49      Intergraph IP-32C, 30MHz Clipper [Intergraph 86]
.38       -        80386+Weitek 1167, 20MHz, 64K cache, Green Hills
.41                Sun-3/160, Weitek FPA (Rolled BLAS)
.41       .45      DEC MicroVAX 3200/3500/3600, VAX/VMS
.45       .54      HP 9000 Model 840S [HP 87]
?         ?        Sun-3/260, Weitek FPA (Rolled BLAS)
.49       .66      VAX 8600, VAX/VMS 4.5
.49       .54      HP 9000/825S [HP 87]
.51       .72      HP 9000 Model 850S [HP 87]
.60       .72      MIPS M/500, f77 1.21
.65       .76      VAX 8500, VAX/VMS
?         .96      VAX 8650, VAX/VMS
.78                IBM 9370-90, VS FORT 1.3.0
.86       -        Sun-4/110 [Sun 88]
?         ?        VAX 8550/8700/8800, VAX/VMS
1.1       1.3      [...] SPARC (Rolled BLAS) w/ Weitek 1164/5 [Sun 87a]
1.5       1.7      ELXSI 6420
1.5       1.6      MIPS M/1000, f77 1.31
1.6       2.0      Alliant FX-1 (1 CE) [Alliant 86]
2.1                IBM 3081K H enhanced opt-3
2.1       2.2      MIPS M/120-5, f77 [...]
3.0       3.3      CONVEX C-1/XP, Fort 2.0 (Rolled BLAS)
?         3.9      MIPS M/2000-8, 25MHz R3000/R3010, f77 1.31
3.8       4.0      MIPS M/2000-?, 25MHz R3000/R3010, f77 1.40 (Rolled BLAS)
?         -        Multiflow Trace 7/200 Fortran 1.4 (Rolled BLAS)
7.6       11.0     Alliant FX-8, 8 CEs, FX Fortran, v2.0.1.9
6.?       11.0     i860 uP @ 33.3MHz Vector (Simulation)
7.3       13.3     i860 uP @ 40MHz Vector (Simulation)
12        23       CRAY 1S CFT (Rolled BLAS)
52        61       ETA10-E (1 proc, 10.5ns)
56        60       CRAY X-MP/4 CFT (Rolled BLAS)
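
Linpack MFLOPS figures are conventionally derived from a nominal operation count of 2n^3/3 + 2n^2 for an n x n system, divided by the solve time. The sketch below applies that convention to the 100 x 100 case. It is an illustration under that assumption rather than the exact procedure used for this table, and the function names are hypothetical.

```python
def linpack_operations(n: int) -> float:
    """Nominal floating-point operation count for an n x n Linpack solve."""
    return (2.0 * n**3) / 3.0 + 2.0 * n**2


def linpack_mflops(n: int, seconds: float) -> float:
    """MFLOPS rating for solving an n x n system in `seconds`."""
    return linpack_operations(n) / seconds / 1.0e6


def seconds_at_rate(n: int, mflops: float) -> float:
    """Solve time implied by a given MFLOPS rating."""
    return linpack_operations(n) / (mflops * 1.0e6)


if __name__ == "__main__":
    ops = linpack_operations(100)
    t = seconds_at_rate(100, 10.0)  # time implied by a 10.0 MFLOPS rating
    print(ops, t, linpack_mflops(100, t))
```

Because the 100 x 100 problem involves well under a million operations, the solve takes only a fraction of a second at the rates in this table, so timer resolution and loop overhead have a visible effect on the reported rating.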